We present a modification of the so-called Parrondo's paradox where one is allowed to choose in
each turn the game that a large number of individuals play. It turns out that, by choosing the game
which gives the highest average earnings at each step, one ends up with systematic losses, whereas
a periodic or random sequence of choices yields a steady increase of the capital. An explanation
of this behavior is given by noting that the short-range maximization of the returns is "killing the
goose that laid the golden eggs". A continuous model displaying similar features is analyzed using
dynamic programming techniques from control theory.
The physics of Brownian motors has recently inspired the discovery of a
counterintuitive phenomenon in gambling games, which is attracting
considerable attention. Ajdari and Prost showed that a one-dimensional
Brownian particle in a flashing asymmetric potential experiences a net
motion in a given direction. The motion persists even against a small
force. Consequently, one can have the following startling situation: the
particle moves in the direction of the force if the potential is on or if
it is off, whereas it moves in the opposite direction if the potential is
flashing. The translation of the dynamics of this Brownian particle to
gambling games constitutes the so-called Parrondo's paradox: two losing
games yield, when alternated, a winning game. The effect is obtained with
two games, A and B, that mimic the behavior of the Brownian particle in a
flat and a ratchet potential, respectively.

In game A, the capital $X(t)$ of the player increases by one unit with
probability $p_1 = 1/2 - \epsilon$, where $\epsilon$ is a small positive
real number, and decreases by one unit with probability $1 - p_1$. In the
following, we will interpret the game as a bet on the toss of a slightly
biased coin.

Game B is played with two coins depending on $X(t)$: if $X(t)$ is not a
multiple of three, we use coin 2, with a probability to win
$p_2 = 3/4 - \epsilon$ and a probability to lose $1 - p_2$; if $X(t)$ is a
multiple of three, we use coin 3, with a probability to win
$p_3 = 1/10 - \epsilon$ and a probability to lose $1 - p_3$. It can be
proved that the combination of the "good" coin 2 and the "bad" coin 3
yields a fair game when $\epsilon = 0$, and a losing game when
$\epsilon > 0$. By fair, losing and winning here we mean that the average
capital $\langle X(t) \rangle$ is a constant, decreasing or increasing
function of $t$, respectively.

We then have two games, A and B, which are fair (losing) if
$\epsilon = 0$ ($\epsilon > 0$). The aforementioned counterintuitive
effect is that the alternation of A and B, in some given random or
periodic sequences, is a winning game.
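The fairness of game B at $\epsilon = 0$ can be checked numerically. The following is a minimal sketch, not taken from the original text: it assumes that only the capital modulo three matters for game B, treats it as a three-state Markov chain, and iterates the chain until it settles. The function names are illustrative.

```python
def game_b_step(dist, eps):
    """One turn of game B applied to the distribution of X(t) mod 3."""
    p2 = 3 / 4 - eps   # "good" coin, used when X(t) is not a multiple of 3
    p3 = 1 / 10 - eps  # "bad" coin, used when X(t) is a multiple of 3
    d0, d1, d2 = dist
    return (
        (1 - p2) * d1 + p2 * d2,  # reach 0: lose from 1, or win from 2
        p3 * d0 + (1 - p2) * d2,  # reach 1: win from 0, or lose from 2
        (1 - p3) * d0 + p2 * d1,  # reach 2: lose from 0, or win from 1
    )


def stationary_b(eps, turns=2000):
    """Iterate game B from a uniform start until the distribution settles."""
    dist = (1 / 3, 1 / 3, 1 / 3)
    for _ in range(turns):
        dist = game_b_step(dist, eps)
    return dist


def win_probability_b(dist, eps):
    """Probability of winning the next turn of game B."""
    p2, p3 = 3 / 4 - eps, 1 / 10 - eps
    return dist[0] * p3 + (1 - dist[0]) * p2


fair = stationary_b(0.0)
biased = stationary_b(0.005)
print(fair[0], win_probability_b(fair, 0.0))
print(biased[0], win_probability_b(biased, 0.005))
```

For $\epsilon = 0$ the stationary probability of holding a multiple of three is 5/13 and the winning probability is 1/2, so the game is fair. For $\epsilon = 0.005$ the winning probability drops below 1/2.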

The phenomenon indicates that the alternation of stochastic dynamics can
result in a behavior which differs qualitatively from that exhibited by
each of the dynamics and therefore could, in principle, be relevant in a
variety of situations, ranging from economics to physics, where the
constraints or the dynamics of a system switch between two arrangements.

However, the paradox loses all its interest if one is allowed to choose
the game to play in each turn. In this case, the trivial strategy is to
choose A when $X(t)$ is a multiple of three and B otherwise. The resulting
game is clearly winning and this strategy performs better than any
periodic or random alternation of games A and B. We could call these
latter strategies "blind", since they do not use any information about the
state of the system.

In a similar way, if one has some information about the position of the
particle in a flashing ratchet, it is possible to switch the potential on
and off in such a way that energy is extracted from a single thermal bath.
This is nothing but a Maxwell demon.

For a Brownian particle, it is known that the acquisition of this
information, or its subsequent erasure from a memory device, has some
unavoidable entropy cost, which prevents any violation of the Second Law
of Thermodynamics. On the other hand, in other contexts, like economics,
there are no such limitations and it is unlikely that blind strategies
could be of any interest.

However, the model that we present in this Letter shows that this is not
the case. It is a modification of the original Parrondo's paradox in which
blind strategies are winning whereas a strategy which chooses the game
with the highest average return is losing. Moreover, we will identify the
mechanism underlying this counterintuitive behavior and show with a second
model that it can also appear in simple deterministic systems.

The two models presented are also of interest in control theory. The
choice of a strategy maximizing some quantity is a problem widely treated
by optimal control theory, which has proven to be a powerful tool in a
number of disciplines including engineering, physics, chemistry,
economics, social sciences, medicine and biology. The counterintuitive
phenomenon discussed in this Letter has not, to our knowledge, been
reported before and can be relevant in optimization problems involving
dynamical systems.
FIG. 1: Evolution of the average money of an infinite number of players
for $\gamma = 0.675$ and $\epsilon = 0.005$ and the three strategies
discussed in the text.



with $\pi_{0c} = (p_1 - p_2)/(p_3 - p_2) = 5/13$.

Let us focus now on the behavior of $\pi_0(t)$ for $\epsilon = 0$. On one
hand, $\pi_0(t)$ tends to 1/3 if A is played a large number of turns,
because, under the rules of A, the capital $X(t)$ is a symmetric and
homogeneous random walk. On the other hand, if B is played repeatedly,
$\pi_0(t)$ tends to 5/13, i.e., to $\pi_{0c}$. This can be proved by
analyzing game B as a Markov chain. Notice also that this coincidence was
expected, since B is a fair game for $\epsilon = 0$ and $\pi_{0c}$ has
been obtained by solving $p_{\mathrm{win}B} = p_{\mathrm{win}A} = 1/2$.
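As a worked check of the value 5/13, the condition can be solved explicitly. The derivation below is a sketch that uses the winning probabilities $p_{\mathrm{win}A} = p_1$ and $p_{\mathrm{win}B} = \pi_0 p_3 + (1 - \pi_0) p_2$ together with the coin probabilities defined for games A and B. It also shows that $\epsilon$ cancels, so the critical value is the same for every $\epsilon$.

```latex
\begin{align*}
p_{\mathrm{win}B} = p_{\mathrm{win}A}
  &\;\Longrightarrow\; \pi_0\, p_3 + (1 - \pi_0)\, p_2 = p_1 \\
  &\;\Longrightarrow\; \pi_{0c} = \frac{p_1 - p_2}{p_3 - p_2}
   = \frac{(1/2 - \epsilon) - (3/4 - \epsilon)}
          {(1/10 - \epsilon) - (3/4 - \epsilon)}
   = \frac{-1/4}{-13/20} = \frac{5}{13}.
\end{align*}
```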

Figure 2 represents schematically the evolution of $\pi_0(t)$ under the
action of each game, as well as the prescription of the SR optimal
strategy given by Eq. (2). Now we are ready to explain why the SR optimal
strategy yields worse results than the periodic and random sequences.






The first model consists of a large number $N$ of players. At each turn of
the game, a fraction $\gamma$ of these players is randomly selected. We
are told how much money every player has and we are then allowed to choose
a game, A or B, which will be played by all $\gamma N$ players in the
subset. Our goal is to choose each turn between A or B in order to
maximize the average earnings of the players. We consider three different
strategies:

• Periodic strategy: the game is selected by following
a given periodic sequence, for example ABBABB...

• Random strategy: the game is chosen randomly 
with equal probability for both A and B. 

• Short-range (SR) optimal strategy: the game that 
will yield the highest average return is chosen. 

As we will see below, the third strategy makes use of the available
information whereas the periodic and random strategies are blind, in the
sense defined above. Surprisingly, these blind strategies produce
systematic gains whereas the SR optimal strategy is losing, as shown in
figure 1.

A detailed analysis of our model will reveal the underlying mechanism
causing this unexpected phenomenon. The key quantity for this analysis is
$\pi_0(t)$, the fraction of players whose money is a multiple of three in
turn $t$. From $\pi_0(t)$, it is not difficult to calculate the average
fraction of players that would win in each game:



$$
\begin{aligned}
p_{\mathrm{win}A} &= p_1 \\
p_{\mathrm{win}B} &= \pi_0\, p_3 + (1 - \pi_0)\, p_2
\end{aligned}
\tag{1}
$$



The SR optimal strategy chooses the game which gives the highest return in
one turn. Comparing $p_{\mathrm{win}A}$ and $p_{\mathrm{win}B}$ we get the
following prescription:



$$
\begin{aligned}
&\text{play A if } \pi_0(t) > \pi_{0c} \\
&\text{play B if } \pi_0(t) < \pi_{0c}
\end{aligned}
\tag{2}
$$



FIG. 2: Schematic representation of the evolution of $\pi_0(t)$ under the
action of game A and game B. The prescription of the SR optimal strategy
is also represented.



Consider an initial distribution of the capital such that
$\pi_0(0) < \pi_{0c} = 5/13$. The SR optimal strategy chooses B and,
consequently, $\pi_0$ increases. If $\pi_0(1)$ is still under 5/13 (and
this is the case for $\gamma$ small enough), the SR optimal strategy
chooses B again. We see that, as long as $\pi_0(t)$ does not exceed 5/13,
the SR optimal strategy chooses B in every turn. However, this choice,
although it is the one which gives the highest returns in each turn,
drives $\pi_0(t)$ towards 5/13, i.e., towards values of $\pi_0(t)$ where
the gain is small. For instance, if $\gamma = 1/2$, the SR optimal
strategy chooses B forty times in a row before switching to game A. This
will make $\pi_0$ approximately equal to $\pi_{0c} = 5/13$ at almost every
turn, as can be seen in figure 3. Figure 4 shows that, as long as
$\pi_0(t)$ is close to $\pi_{0c}$, the average capital remains
approximately constant.

On the other hand, the random and the periodic strategies choose game A
even when $\pi_0 < \pi_{0c}$. This will not produce earnings in this
specific turn, but will take $\pi_0$ away from $\pi_{0c}$ and make the
corresponding average money grow faster than that for the SR optimal
strategy, as can be seen in the figure.

In other words, the SR optimal strategy, by choosing 
B too many times, is "killing the goose that laid the 
golden eggs", and to perform better than this strategy 
one must sacrifice short-term profits for higher returns 
in the future, as the blind strategies considered here do. 

FIG. 3: Evolution of $\pi_0$ for $N = \infty$, $\gamma = 0.5$,
$\epsilon = 0$ and the three different strategies. The arrows show the
turns where the short-range optimal strategy chooses game A.

FIG. 4: Average of the capital of the players corresponding to the
evolution of $\pi_0$ depicted in figure 3. The steps in the curve
corresponding to the short-range optimal strategy coincide with the turns
where A is selected (see Fig. 3).

The introduction of $\epsilon$ has two consequences: it turns A and B into
losing games when played alone, and it decreases the stationary value of
$\pi_0$ for game B, which now is below $\pi_{0c} = 5/13$ (the critical
value $\pi_{0c}$ does not change). Due to the latter, the system may get
trapped playing game B forever when following the SR optimal strategy and,
since B is now a losing game, the average money will decrease. This is,
for instance, the situation already presented in figure 1 for
$\epsilon = 0.005$ and $\gamma = 0.675$.
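The three strategies can be compared with a short simulation. The following is a minimal sketch under stated assumptions, not the authors' code: it works in the limit of infinitely many players, so only the distribution of the capital modulo three is tracked and the selected fraction $\gamma$ is taken to be a representative sample; every player is assumed to start with zero capital; the periodic sequence is the ABBABB... example quoted above; and all function names are illustrative.

```python
import random


def probabilities(eps):
    """Winning probabilities p1, p2, p3 of the three coins."""
    return 1 / 2 - eps, 3 / 4 - eps, 1 / 10 - eps


def play(dist, game, gamma, eps):
    """Let a fraction gamma of the players play `game` once.

    Returns the new distribution of capital mod 3 and the change in the
    average capital per player.
    """
    p1, p2, p3 = probabilities(eps)
    d0, d1, d2 = dist
    if game == "A":
        moved = (
            (1 - p1) * d1 + p1 * d2,
            p1 * d0 + (1 - p1) * d2,
            (1 - p1) * d0 + p1 * d1,
        )
        p_win = p1
    else:
        moved = (
            (1 - p2) * d1 + p2 * d2,
            p3 * d0 + (1 - p2) * d2,
            (1 - p3) * d0 + p2 * d1,
        )
        p_win = d0 * p3 + (1 - d0) * p2
    new_dist = tuple((1 - gamma) * old + gamma * new
                     for old, new in zip(dist, moved))
    return new_dist, gamma * (2 * p_win - 1)


def run(choose, turns, gamma, eps):
    """Average capital after `turns` turns; choose(t, pi0) returns 'A' or 'B'."""
    dist = (1.0, 0.0, 0.0)  # assumption: everybody starts with zero capital
    capital = 0.0
    for t in range(turns):
        dist, gain = play(dist, choose(t, dist[0]), gamma, eps)
        capital += gain
    return capital


GAMMA, EPS, TURNS = 0.675, 0.005, 200
p1, p2, p3 = probabilities(EPS)
pi0c = (p1 - p2) / (p3 - p2)  # critical fraction, equal to 5/13
rng = random.Random(0)        # fixed seed so the random strategy is repeatable

periodic = run(lambda t, pi0: "ABB"[t % 3], TURNS, GAMMA, EPS)
random_ = run(lambda t, pi0: rng.choice("AB"), TURNS, GAMMA, EPS)
short_range = run(lambda t, pi0: "B" if pi0 < pi0c else "A", TURNS, GAMMA, EPS)
print(periodic, random_, short_range)
```

With these parameters the short-range rule plays A once, then stays on B, and its average capital ends up negative, while the periodic sequence ends up positive.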

Now we present a continuous and deterministic model 
which displays some of the features of the previous one. 
Consider the following dynamical system: 



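$$
\begin{aligned}
\dot{y}(t) &= \alpha(t)\, x(t) \\
\dot{x}(t) &= -\frac{1}{\tau} \left[ x(t) - x_k \bigl(1 - \alpha(t)\bigr) \right]
\end{aligned}
\tag{3}
$$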



with $\alpha(t) = 0$ or 1. Our task is to find $\alpha(t)$ that maximizes
$y(T)$. These equations are a rather generic model of a system which
produces some output, for instance a production plant. Here $y(t)$ is the
total cumulative output of the plant up to time $t$. We can decide to
switch the plant on or off at every time $t$ by setting $\alpha(t) = 1$ or
$\alpha(t) = 0$, respectively. Finally, $x(t)$ is the productivity of the
plant, which decreases exponentially when the plant is working and goes
back to its full capacity value $x_k$ when the plant is off, $\tau$ being
the characteristic time of these relaxations. If we are allowed to use the
plant up to a time $T$, the problem is to find the protocol or policy
$\alpha(t)$ maximizing the total output $y(T)$.

A naive approach to the problem consists of maximizing the output rate
$\dot{y}(t)$ at every time $t$:



$$
\begin{aligned}
\alpha(t) &= 0 \quad \text{if } x(t) < 0 \\
\alpha(t) &= 1 \quad \text{if } x(t) > 0
\end{aligned}
\tag{4}
$$



However, it is not hard to see that this prescription will keep
$y(t) = y(0)$ for all $t$ if initially the productivity $x(0)$ is
negative, or make $y(t)$ tend to $y(0) + \tau x(0)$ exponentially if
$x(0) > 0$. In either case, $y(t)$ will attain a saturation value. The
criterion (4) prescribes making the plant work whenever the productivity
is positive, and this is equivalent to the short-range optimal strategy in
our previous model, which dictated playing game B if
$\pi_0(t) < \pi_{0c}$. If the productivity is positive, we of course get
more when the plant is working, but then we also get a decrease of the
productivity which will end up exhausting the system. With the
prescription given by (4) we are also killing the goose that laid the
golden eggs, and it is again possible to do better by letting the plant
rest even when the productivity $x(t)$ is positive.
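The saturation under criterion (4) and the benefit of resting can be seen in a short numerical experiment. The following is a minimal sketch, assuming a simple Euler discretization of system (3) with $\tau = 1$ and full-capacity productivity 1 (written `x_k` in the code); the step size, the horizon and the blind alternating policy are illustrative choices, not taken from the text.

```python
def integrate(policy, x0, horizon, dt=1e-3, tau=1.0, x_k=1.0):
    """Euler integration of system (3); policy(step, x) returns alpha (0 or 1)."""
    x, y = x0, 0.0
    for step in range(round(horizon / dt)):
        alpha = policy(step, x)
        y += alpha * x * dt                        # dy/dt = alpha * x
        x += -(x - x_k * (1 - alpha)) * dt / tau   # dx/dt from system (3)
    return y


def greedy(step, x):
    """Criterion (4): work whenever the productivity is positive."""
    return 1 if x > 0 else 0


def alternating(step, x):
    """Blind policy: work on even steps, rest on odd steps."""
    return 1 if step % 2 == 0 else 0


y_greedy = integrate(greedy, x0=1.0, horizon=10.0)
y_alternating = integrate(alternating, x0=1.0, horizon=10.0)
print(y_greedy, y_alternating)
```

Starting from full capacity, the greedy output stays below $\tau x(0) = 1$ however long the plant runs, whereas the alternating policy, which works only half of the time, keeps accumulating output at a rate close to 1/4 per unit time.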

Surprisingly enough, and despite the linearity and simplicity of our
system, the precise optimal policy $\alpha(t)$ is not easy to find. Some
of the techniques provided by control theory fail in this case, such as
the Euler-Lagrange equations and the Pontryagin principle [6]. The only
way to completely solve the problem is to discretize it and apply the
so-called Bellman's Optimality Criterion. The optimal choice of
$\alpha(t)$ in terms of the state $x(t)$ happens to be:



$$
\alpha(t) =
\begin{cases}
0 & \text{if } x(t) < x^c(t) \\
1 & \text{if } x(t) > x^c(t)
\end{cases}
\tag{5}
$$



where $x^c(t)$ is a critical value for the productivity which can be
calculated by solving recurrence equations. In figure 5 we show $x^c(t)$
for $\tau = x_k = 1$ and $T = 2$, as well as the behavior of $y(t)$ and
$x(t)$. We see that this optimal policy achieves a steady increase of the
output $y(t)$. Fig. 6 shows a numerical computation of $x^c(t)$ for
$T = 2$, 3, and 4, where we can see that $x^c(t) = 0.5$ until the final
part of the total time interval $[0, T]$.
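A critical productivity of this kind can be estimated by backward induction on a discretized version of system (3). The following is a minimal sketch and not the authors' recurrence: the Euler time step, the productivity grid, and the linear interpolation of the value function are all assumptions made for illustration, and the threshold is defined here as the smallest grid productivity at which working beats resting.

```python
def critical_productivity(horizon, dt=0.01, nx=201, tau=1.0, x_k=1.0):
    """Backward induction (Bellman recursion) for a discretized system (3).

    Returns one threshold per time step: the smallest grid productivity at
    which working (alpha = 1) beats resting (alpha = 0).
    """
    dx = x_k / (nx - 1)
    grid = [i * dx for i in range(nx)]
    value = [0.0] * nx  # nothing can be earned after the final time

    def interpolate(x):
        """Linear interpolation of the current value function on the grid."""
        pos = min(max(x / dx, 0.0), nx - 1.0)
        i = min(int(pos), nx - 2)
        w = pos - i
        return (1 - w) * value[i] + w * value[i + 1]

    thresholds = []
    for _ in range(round(horizon / dt)):      # loop runs backwards in time
        new_value, threshold = [], x_k
        for x in reversed(grid):
            work = x * dt + interpolate(x - x * dt / tau)
            rest = interpolate(x + (x_k - x) * dt / tau)
            if work > rest:
                threshold = x                 # ends as the smallest such x
            new_value.append(max(work, rest))
        new_value.reverse()
        value = new_value
        thresholds.append(threshold)
    thresholds.reverse()  # thresholds[n] now belongs to time n * dt
    return thresholds


thresholds = critical_productivity(horizon=4.0)
print(thresholds[0], thresholds[-1])
```

In this sketch the threshold at the last time step is the first positive grid point, because with no future left any positive productivity is worth using. Early thresholds depend on the grid and step size, so they should be read as estimates only.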

FIG. 5: The cumulative output $y(t)$, the productivity $x(t)$, and the
critical value $x^c(t)$ for the discrete version of the system (3),
$T = 2$ and $\tau = x_k = 1$.

FIG. 6: Numerical computation of $x^c(t)$, for $T = 2$, 3 and 4, and
$\tau = x_k = 1$.

Consequently, the behavior of the optimal policy $\alpha(t)$ is as
follows. There is a first stage in which the productivity $x(t)$ goes to
0.5 by setting $\alpha = 0$ if $x(0) < 0.5$ and $\alpha = 1$ if
$x(0) > 0.5$. Once $x(t)$ reaches the value 0.5, the optimal policy
prescribes very rapid changes between $\alpha = 0$ and $\alpha = 1$ which
keep $x(t) \approx 0.5$. Finally, in the last part of the interval
$[0, T]$, when $x^c(t)$ departs from 0.5, the optimal policy chooses
$\alpha(t) = 1$.

The first two stages can be easily explained. Assuming
$\alpha(t) \in [0, 1]$ and constant, $x(t)$ reaches a stationary value
$x_{st} = x_k (1 - \alpha)$. This gives the following stationary slope for
the cumulative output $y(t)$:

$$
\dot{y}(t) = \alpha\, x_{st} = x_k \left[ \alpha (1 - \alpha) \right]
\tag{6}
$$

which is maximum for $\alpha = 0.5$ and $x_{st} = x_k/2$. Therefore, the
optimal policy should try to drive $x(t)$ to $x_k/2$ and keep it there,
i.e., should try to make the plant work at half of its full capacity. In
our case, $\alpha(t)$ can only take values 0 and 1. However, by rapid
oscillations one can get an effective value of $\alpha(t)$ equal to any
real number between 0 and 1. The optimal policy implies a rapid variation
which gives an effective value $\alpha = 0.5$, yielding a slope for $y(t)$
which is $\dot{y}(t) = x_k/4$, definitely better than the short-range
optimization of $\dot{y}(t)$ which gave us a horizontal slope.

The final stage of the optimal policy $\alpha(t)$ has a clear intuitive
explanation. We have to abandon the plant at time $T$. Therefore, the
optimal policy when $t$ is approaching $T$ should set $\alpha(t) = 1$,
since we want to get as much as possible and do not care if we leave the
plant exhausted after $T$. This is exactly what happens to a
middle-distance runner: she keeps a constant velocity which allows her to
maintain a stationary regime, but she sprints in the last meters of the
race to use up all her strength. With this picture in mind, we call this
last stage the "sprint". One can calculate the duration of the sprint,
$t_{\mathrm{sprint}}$, in our model, assuming that $x(0) = x_k/2$,
$\alpha(t) = 1/2$ for $t < T - t_{\mathrm{sprint}}$ and $\alpha(t) = 1$
for $t > T - t_{\mathrm{sprint}}$. Eq. (3) can then be fully solved and it
is found that $y(T)$ reaches its maximum for
$t_{\mathrm{sprint}} = (\ln 2)\,\tau \approx 0.693\,\tau$, in agreement
with the numerically computed curves.
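The sprint duration follows from a short calculation, sketched here under the assumptions just stated (productivity held at half of full capacity with effective $\alpha = 1/2$, then $\alpha = 1$ during the sprint, and $T$ longer than the sprint). The symbol $x_k$ denotes the full-capacity productivity.

```latex
\begin{align*}
x(t) &= \frac{x_k}{2}, \qquad \dot{y}(t) = \frac{x_k}{4}
  && \text{for } t < T - t_{\mathrm{sprint}}, \\
x(t) &= \frac{x_k}{2}\, e^{-(t - T + t_{\mathrm{sprint}})/\tau},
  \qquad \dot{y}(t) = x(t)
  && \text{for } t > T - t_{\mathrm{sprint}}, \\
y(T) - y(0) &= \frac{x_k}{4}\,(T - t_{\mathrm{sprint}})
  + \frac{x_k \tau}{2}\left(1 - e^{-t_{\mathrm{sprint}}/\tau}\right), \\
\frac{\partial y(T)}{\partial t_{\mathrm{sprint}}}
  &= -\frac{x_k}{4} + \frac{x_k}{2}\, e^{-t_{\mathrm{sprint}}/\tau} = 0
  \;\Longrightarrow\; t_{\mathrm{sprint}} = \tau \ln 2 .
\end{align*}
```

The second derivative, $-\frac{x_k}{2\tau} e^{-t_{\mathrm{sprint}}/\tau}$, is negative, so this stationary point is a maximum.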

In conclusion, we have presented a stochastic model in which a short-range
optimization leads to systematic losses, whereas blind strategies steadily
win. We have found an explanation of this phenomenon based on the fact
that the short-range optimal strategy is "killing the goose that laid the
golden eggs", and proven that the same mechanism can also arise in a
linear deterministic system. In fact, similar mechanisms have been widely
reported in the realm of economics and ecology, although mainly described
qualitatively. The risks of overtaxing commerce and the overuse of natural
resources are representative cases. We believe that the models presented
here could inspire new quantitative approaches to these problems.